SUMMARY

It is shown how the functional equation technique of dynamic programming can be used to determine the optimal, second best, third best, etc., policies for various deterministic and stochastic multistage decision processes.

This is of importance in various problems in combinatorial analysis, network and switching theory, feedback control, and sensitivity analysis. A routing problem is discussed in some detail.
ON k-TH BEST POLICIES

Richard  Bellman 
Robert  Kalaba 


1.  Introduction 

In recent years, a good deal of effort has been devoted to the study of the theory of multistage decision processes, or dynamic programming, see [1]. The emphasis has been upon the analytic determination of optimal policies and upon the numerical determination of these policies and the associated return functions through the use of certain algorithms carried out by means of high speed computers.

In a number of situations where only a finite number of decisions are possible, there is no question as to the existence of an optimal policy. However, if the number of possibilities is large, then no straightforward enumeration of cases is feasible, and one is forced to develop more elegant, if less simple, techniques. In the course of doing this, the question arises as to whether or not it is possible to determine not only the optimal policy, but the next best policy, and so on, i.e., the preferred suboptimal policies.

Not only is this a challenging mathematical question, but, as we shall discuss below, it has significance in connection with sensitivity analysis and a variety of network problems.

Before considering the general problem, we shall discuss an interesting particular problem, that of "optimal routing."
2. An Optimal Routing Problem

Consider a network, plane or otherwise, consisting of $N$ nodes and interconnecting links. Associate with any two nodes, the $i$-th and $j$-th, a quantity, $t_{ij}$, which we can for intuitive purposes call the time required to travel from $i$ to $j$ along the connecting link.

It is important to keep in mind that $t_{ij}$ need not equal $t_{ji}$, and that some of these quantities need not be finite. The topological meaning of this last comment is that $i$ and $j$ need not be connected. Finally, $t_{ij}$, the time of traverse, need not necessarily be proportional to the actual "physical distance" between the nodes $i$ and $j$.

If we think of node $i$ as the $i$-th state of a system which can only be in one of $N$ states, and if $t_{ij}$ is taken to be the energy required to transform the system from state $i$ to state $j$, then we are seeking the control decisions to be made in order to bring the system from an initial state $i$ to a desired terminal state $N$ with minimal expenditure of energy. This is a fundamental problem of automatic control theory.

The problem of tracing a path of shortest "time" between two given points of the network, 1 and $N$, has been considered by a number of authors. Some published results are contained in Minty [9], Ford [6], Dantzig [4,5], and Bellman [2]. Bock, Kantner and Haynes [3] have discussed the determination of the k-th shortest path, as have Hoffman and Pavley [7]. For a general discussion of this topic and related optimization problems, see Kalaba [8].
Our aim here is to discuss this latter problem using the functional equation technique of dynamic programming.

Although the original question is that of tracing minimal paths from 1 to $N$, we imbed this problem within the family of problems requiring the determination of minimal paths from a generic point $i$ to the fixed point $N$. This apparent complication of the problem enables us to employ functional equations. First we determine shortest paths, then second shortest.

We introduce the sequence of quantities $\{u_i\}$, where

(2.1)  $u_i$ = the time required to go from $i$ to $N$ using an optimal policy, $i = 1, 2, \ldots, N-1$,
       $u_N = 0$.

Observe now that if the initial point is $i$ and if the initial decision is to go directly from $i$ to $j$, then the remainder of the route must certainly be selected to minimize the time required to go from $j$ to $N$. This is an application of the principle of optimality [1].

We are led by this observation to the system of equations

(2.2)  $u_i = \min_{j \neq i} (t_{ij} + u_j)$,  $i = 1, 2, \ldots, N-1$,
       $u_N = 0$.

Although these equations are interlinked in such a fashion that they cannot be solved recursively, there are several quite efficient ways of obtaining the solution, discussed in [2,8].


P-1417


Here we shall merely observe that if we define the new sequence $\{u_i^{(k)}\}$ by means of the relations

(2.3)  $u_i^{(0)} = t_{iN}$,  $i = 1, 2, \ldots, N-1$,
       $u_N^{(0)} = 0$,

and

(2.4)  $u_i^{(k+1)} = \min_{j \neq i} (t_{ij} + u_j^{(k)})$,  $i = 1, 2, \ldots, N-1$,
       $u_N^{(k+1)} = 0$,

for $k = 0, 1, 2, \ldots$, then the sequence $\{u_i^{(k)}\}$ will converge in a monotonically decreasing fashion to the sequence $\{u_i\}$ as $k \to \infty$. In actuality, it is easy to see that $k$ need never be determined beyond the value $N - 2$.

Since only additions and comparisons are required, and since only a small memory is needed, this is a feasible computing scheme for either hand or machine techniques.
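The successive approximations (2.3) and (2.4) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function name `shortest_times`, the 0-indexed node numbering (the terminal node $N$ is the last index), the use of `math.inf` for missing links, and the 4-node network are all assumptions made for the example.

```python
import math


def shortest_times(t):
    """Successive approximations (2.3)-(2.4) for the minimal times u_i.

    t[i][j] is the time from node i to node j (math.inf if there is no link).
    Nodes are 0-indexed here, so the terminal node N is index len(t) - 1.
    """
    n = len(t)
    last = n - 1
    # (2.3): start from the direct links into the terminal node.
    u = [t[i][last] for i in range(n)]
    u[last] = 0
    # (2.4): repeat the sweep; N - 2 sweeps always suffice.
    for _ in range(max(n - 2, 0)):
        new_u = [
            min(t[i][j] + u[j] for j in range(n) if j != i)
            for i in range(last)
        ]
        new_u.append(0)
        if new_u == u:  # nothing changed, so the limit has been reached
            break
        u = new_u
    return u


INF = math.inf
# Hypothetical 4-node network used only for illustration.
t = [
    [INF, 1, 9, 10],
    [INF, INF, 1, 4],
    [INF, INF, INF, 2],
    [INF, INF, INF, INF],
]
print(shortest_times(t))  # [4, 3, 2, 0]
```

Each sweep uses only additions and comparisons and keeps two vectors of length $N$ in memory, which matches the remark above about modest storage.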

With this background, let us treat the problem of determining a second shortest path from $i$ to $N$. Let us introduce the new notation

(2.5)  $\min_k(x_1, x_2, \ldots, x_n)$ = the k-th smallest value of the quantities $x_1, x_2, \ldots, x_n$.

This function is not defined for all $k$: for example, if all the $x_i$ are equal, there is no second smallest value.

Let us introduce the quantities $v_i$ defined as follows:

(2.6)  $v_i$ = the second shortest path length from $i$ to $N$, $i = 1, 2, \ldots, N-1$, if it exists ($v_i < \infty$);
       $v_N = 0$.

In order to obtain equations connecting the various members of the sequence $\{v_i\}$, we argue as follows. If a path from $i$ to $N$ is to be second shortest, then whatever the initial choice, the continuation must be either a shortest path or a second shortest path. It follows that $v_i$ must be equal to one of the expressions

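(2.7)  $\underset{j \neq i}{\min\nolimits_{2}}\,(t_{ij} + u_j)$,
       $\underset{j \neq i}{\min\nolimits_{1}}\,(t_{ij} + v_j)$.

Since it must equal the smaller of these, we obtain finally the desired relation

(2.8)  $v_i = \min \left[ \underset{j \neq i}{\min\nolimits_{2}}\,(t_{ij} + u_j),\ \underset{j \neq i}{\min\nolimits_{1}}\,(t_{ij} + v_j) \right]$.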


Once the sequence $\{u_i\}$ has been computed, the sequence $\{v_i\}$ can be determined using successive approximations in the fashion outlined above.
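A hedged sketch of how relation (2.8) might be iterated is given here. The names `kth_smallest` and `second_shortest_times`, the 0-indexed nodes, and the sample network are assumptions of the example. The sketch also departs from (2.6) in one respect that is not stated in the text: the terminal entry is held at infinity rather than zero, so that the direct link into $N$ is not counted a second time through the term $t_{iN} + v_N$. The shortest times $u_i$ are taken as input.

```python
import math


def kth_smallest(values, k):
    """min_k of (2.5): the k-th smallest distinct finite value, inf if undefined."""
    distinct = sorted({v for v in values if v < math.inf})
    return distinct[k - 1] if len(distinct) >= k else math.inf


def second_shortest_times(t, u):
    """Iterate (2.8) for the second shortest times v_i, given the shortest times u."""
    n = len(t)
    last = n - 1
    # First expression in (2.7): the second smallest of t_ij + u_j.
    first = [
        kth_smallest([t[i][j] + u[j] for j in range(n) if j != i], 2)
        for i in range(last)
    ]
    # Assumption of this sketch: no second path leaves the terminal node.
    v = first + [math.inf]
    for _ in range(n):
        new_v = [
            min(first[i], min(t[i][j] + v[j] for j in range(n) if j != i))
            for i in range(last)
        ]
        new_v.append(math.inf)
        if new_v == v:  # fixed point of (2.8) reached
            break
        v = new_v
    return v


INF = math.inf
# Hypothetical 4-node network; its shortest times are u = [4, 3, 2, 0].
t = [
    [INF, 1, 9, 10],
    [INF, INF, 1, 4],
    [INF, INF, INF, 2],
    [INF, INF, INF, INF],
]
print(second_shortest_times(t, [4, 3, 2, 0]))  # [5, 4, inf, inf]
```

In this network the second shortest route from the first node (length 5) begins with an optimal first step and then follows a second shortest continuation, which is the case covered by the second expression in (2.7).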


3. General Discrete Deterministic Processes

Let us generalize the foregoing considerations by considering dynamic programming processes of the following special type:
(3.1)  (a) The state of the system is specified by a finite dimensional vector, the components of which can assume only a finite set of values.

       (b) At each stage, we have a choice of only a finite set of decisions.

The problem of determining an N-stage policy which maximizes a prescribed function of the final state is then of completely finite nature, and it is sensible to ask not only for an optimal policy, but also a next best policy, and so on. We consider all policies leading to a maximum return as optimal or first best, all policies leading to a return that is less than optimal but at least as great as all others as second best, and so on. Why we are interested in ordering policies will be discussed below in Section 6.

Let $g(x)$ denote the criterion function measuring the value of the final state, and let $\{T_i(x)\}$ denote the set of allowable decisions, resulting in transformations of the state $x$ of the system at each stage. Then, if we introduce the function

(3.2)  $f_N(x)$ = the return from an N-stage process obtained using an optimal policy, starting with a system in state $x$, $N = 1, 2, \ldots$,

we obtain in the usual fashion the relations

(3.3)  $f_1(x) = \max_i g(T_i(x))$,
       $f_N(x) = \max_i f_{N-1}(T_i(x))$,  $N = 2, 3, \ldots$.
Next, let us introduce the functions

(3.4)  $f_N^{(k)}(x)$ = the return from an N-stage process with the system initially in state $x$, using a k-th best policy. A k-th best policy produces a return which is smaller than that produced by all 1-st, 2-nd, ..., (k-1)-st best policies, but which is at least as great as the return produced by all other policies.


In particular, we have

(3.5)  $f_N(x) = f_N^{(1)}(x)$.


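Another application of the principle of optimality leads to the relations

(3.6)  $f_N^{(k)}(x) = \underset{i}{\max\nolimits_{k}} \left[ f_{N-1}^{(1)}(T_i(x)),\ f_{N-1}^{(2)}(T_i(x)),\ \ldots,\ f_{N-1}^{(k)}(T_i(x)) \right]$,  $N = 2, 3, \ldots$,

(3.7)  $f_1^{(k)}(x) = \underset{i}{\max\nolimits_{k}}\, g(T_i(x))$.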

It follows that the terms in the sequence $f_N^{(1)}(x), f_N^{(2)}(x), \ldots$, may then be determined recursively, for suitable ranges of $N$ and $x$. At the same time, the appropriate decision in a k-th best policy is determined in terms of the state of the system and the time remaining before termination of the process.

As $k$ increases, the dimensionality of the problem increases. The memory requirements are directly proportional to $k$.
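The recursion (3.6) and (3.7) can be illustrated with a short Python sketch. The function name `kth_best_returns`, the memo dictionary, and the toy process (integer states, decisions "add 1" and "double") are hypothetical and chosen only for the example. Following the definition in (3.4), policies with equal returns share a rank, so only distinct return values are kept.

```python
def kth_best_returns(x, n_stages, k, transforms, g):
    """Return [f_N^(1)(x), ..., f_N^(k)(x)], shorter if fewer distinct returns exist."""
    memo = {}

    def best(state, stages):
        key = (state, stages)
        if key not in memo:
            if stages == 1:
                # (3.7): rank the immediate outcomes g(T_i(x)).
                candidates = {g(T(state)) for T in transforms}
            else:
                # (3.6): pool the 1st..k-th best returns of every successor state.
                candidates = set()
                for T in transforms:
                    candidates.update(best(T(state), stages - 1))
            # Store k values per (state, stages) pair.
            memo[key] = sorted(candidates, reverse=True)[:k]
        return memo[key]

    return best(x, n_stages)


# Hypothetical process: the state is an integer, decisions add 1 or double it,
# and the criterion is the final state itself.
transforms = [lambda s: s + 1, lambda s: 2 * s]
print(kth_best_returns(1, 3, 3, transforms, lambda s: s))  # [8, 6, 5]
```

The table holds at most $k$ numbers for each state and stage, which is one way to read the remark that memory grows in direct proportion to $k$.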
4. Paths Most Likely to be Available in a Network

Once again, let us consider a network of $N$ nodes numbered from 1 to $N$. The link from $i$ to $j$ is assumed to be available for service with probability $p_{ij}$, and availabilities for the various links are assumed to be independent. Consequently, we are involved in a stochastic situation.

We first show how to determine which paths from $i$ to $N$ have greatest probability of being available for service, and then indicate how the second, third, and subsequent most likely paths can be calculated. This problem is an important one in telephony, where one point in a switching network can be connected to another via several different paths; if the most likely paths are unavailable, then the second most likely may be scanned, etc. Let

(4.1)  $u_i$ = the probability that a path from $i$ to $N$ is available for service, the path being an optimal one.

Once again, upon employing the principle of optimality [1], we have

(4.2)  $u_i = \max_{j \neq i} p_{ij} u_j$,  $i = 1, 2, \ldots, N-1$.

These  equations  can  be  resolved  using  successive  approximations, 
and,  as  before,  the  method  is  a  method  of  exhaustion;  i.e.,  it 
converges  after  a  finite  number  of  steps  bounded  in  advance. 
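A minimal sketch of these successive approximations for (4.2) follows. It mirrors the scheme of (2.3) and (2.4) with sums replaced by products and minima by maxima. The boundary value $u_N = 1$, the starting approximation $u_i^{(0)} = p_{iN}$, the function name `best_availability`, and the sample probabilities are assumptions of the example rather than statements from the text.

```python
def best_availability(p):
    """Successive approximations for (4.2).

    p[i][j] is the probability that the link from i to j is available
    (0.0 if there is no link). Nodes are 0-indexed; node N is the last index.
    """
    n = len(p)
    last = n - 1
    # Assumed start: only the direct link into N is considered.
    u = [p[i][last] for i in range(n)]
    u[last] = 1.0  # assumed boundary value: N is already reached
    for _ in range(max(n - 2, 0)):
        new_u = [
            max(p[i][j] * u[j] for j in range(n) if j != i)
            for i in range(last)
        ]
        new_u.append(1.0)
        if new_u == u:  # no further improvement
            break
        u = new_u
    return u


# Hypothetical link availabilities (dyadic values keep the arithmetic exact).
p = [
    [0.0, 0.5, 0.125, 0.0625],
    [0.0, 0.0, 0.5, 0.25],
    [0.0, 0.0, 0.0, 0.75],
    [0.0, 0.0, 0.0, 0.0],
]
print(best_availability(p))  # [0.1875, 0.375, 0.75, 1.0]
```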

Next,  let  us  introduce 
(4.3)  $v_i$ = the probability that a second best path is available for service.


Then we have

(4.4)  $v_i = \max \left[ \underset{j \neq i}{\max\nolimits_{1}}\, p_{ij} v_j,\ \underset{j \neq i}{\max\nolimits_{2}}\, p_{ij} u_j \right]$,  $i = 1, 2, \ldots, N-1$.


The solution may now be obtained by successive approximations.
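One way relation (4.4) might be iterated is sketched here. The names `kth_largest` and `second_best_availability`, the boundary value $v_N = 0$ (no second path leaves the terminal node), and the sample data are assumptions of the example. Values are compared by exact equality, so the illustration uses dyadic probabilities; with general floating-point data a tolerance or exact rational arithmetic would be needed.

```python
def kth_largest(values, k):
    """max_k: the k-th largest distinct positive value, 0.0 if undefined."""
    distinct = sorted({v for v in values if v > 0}, reverse=True)
    return distinct[k - 1] if len(distinct) >= k else 0.0


def second_best_availability(p, u):
    """Iterate (4.4) for v_i, given the best availabilities u from (4.2)."""
    n = len(p)
    last = n - 1
    # Second expression in (4.4): second largest of p_ij * u_j.
    second = [
        kth_largest([p[i][j] * u[j] for j in range(n) if j != i], 2)
        for i in range(last)
    ]
    v = second + [0.0]  # assumed boundary value for the terminal node
    for _ in range(n):
        new_v = [
            max(second[i], max(p[i][j] * v[j] for j in range(n) if j != i))
            for i in range(last)
        ]
        new_v.append(0.0)
        if new_v == v:  # fixed point of (4.4) reached
            break
        v = new_v
    return v


# Hypothetical link availabilities; the best values are u = [0.1875, 0.375, 0.75, 1.0].
p = [
    [0.0, 0.5, 0.125, 0.0625],
    [0.0, 0.0, 0.5, 0.25],
    [0.0, 0.0, 0.0, 0.75],
    [0.0, 0.0, 0.0, 0.0],
]
print(second_best_availability(p, [0.1875, 0.375, 0.75, 1.0]))  # [0.125, 0.25, 0.0, 0.0]
```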


5. Stochastic Decision Processes

Let us return to the deterministic decision process discussed in Section 3. We wish to modify this process, though, in that we shall now assume that the result of making decision $i$, with the system in state $x$, is no longer precisely known. In its place, we merely know that there is a certain probability that the system is transformed into the state $y$, which we denote by $dG(y; x, i)$. The objective of the process will now become that of maximizing the expected value of a given function $g(x)$ of the final state.

Let 

(5.1)  $f_N(x)$ = the expected value of the return from an N-stage process, beginning in state $x$, and using an optimal policy.

For  a  one-stage  process,  we  find 
(5.2)  $f_1(x) = \max_i \int g(y)\, dG(y; x, i)$,

where the integration is over all states $y$. For the N-stage process, we have

(5.3)  $f_N(x) = \max_i \int f_{N-1}(y)\, dG(y; x, i)$,  $N = 2, 3, \ldots$.


For the determination of suboptimal policies, we introduce the functions

(5.4)  $f_N^{(k)}(x)$ = the expected value of the return from an N-stage process, beginning in state $x$, and using a k-th best policy.


Using the same reasoning as earlier, we are led to the formulas

(5.5)  $f_N^{(k)}(x) = \underset{i}{\max\nolimits_{k}} \left[ \int f_{N-1}^{(1)}(y)\, dG(y; x, i),\ \int f_{N-1}^{(2)}(y)\, dG(y; x, i),\ \ldots,\ \int f_{N-1}^{(k)}(y)\, dG(y; x, i) \right]$,

(5.6)  $f_1^{(k)}(x) = \underset{i}{\max\nolimits_{k}} \int g(y)\, dG(y; x, i)$,

which permit the recursive determination of the functions defined in Equation (5.4) along with the appropriate policies. In effect, we have to compute a sequence of functions of the variable $x$, which represents a basic simplification if $x$ is of dimension three or less.
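For a finite state space the integrals in (5.5) and (5.6) reduce to sums, and the recursion can be sketched as below. Everything concrete here is an assumption made for illustration: the function name `kth_best_expected`, the dictionary layout `prob[(x, i)][y]` for the transition probabilities, the two-state example, and in particular the convention that a rank-$m$ candidate is skipped when some reachable state has no $m$-th best value.

```python
def kth_best_expected(states, decisions, prob, g, n_stages, k):
    """Tabulate [f_N^(1)(x), ..., f_N^(k)(x)] for every state x via (5.5)-(5.6).

    prob[(x, i)] maps each successor state y to its probability under decision i.
    g maps each state to the value of ending there.
    """
    # (5.6): rank the one-stage expected returns over the decisions.
    table = {
        x: sorted(
            {sum(q * g[y] for y, q in prob[(x, i)].items()) for i in decisions},
            reverse=True,
        )[:k]
        for x in states
    }
    for _ in range(n_stages - 1):
        new_table = {}
        for x in states:
            candidates = set()
            for i in decisions:
                successors = prob[(x, i)]
                # Assumed convention: only ranks defined at every successor count.
                ranks = min(len(table[y]) for y in successors)
                for m in range(ranks):
                    candidates.add(sum(q * table[y][m] for y, q in successors.items()))
            # (5.5): keep the k largest distinct candidates.
            new_table[x] = sorted(candidates, reverse=True)[:k]
        table = new_table
    return table


# Hypothetical two-state process: decision "a" stays put, "b" mixes evenly.
states = [0, 1]
decisions = ["a", "b"]
prob = {
    (0, "a"): {0: 1.0},
    (0, "b"): {0: 0.5, 1: 0.5},
    (1, "a"): {1: 1.0},
    (1, "b"): {0: 0.5, 1: 0.5},
}
g = {0: 0, 1: 4}
print(kth_best_expected(states, decisions, prob, g, 2, 2))
# {0: [3.0, 2.0], 1: [4.0, 3.0]}
```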
6. Sensitivity Analysis

Let us now explain the significance of the foregoing results in connection with the actual solution of physical problems.

As we know, whenever we construct a mathematical model of a real situation, we make certain compromises, or approximations, as they are more diplomatically called. It follows that an optimal solution to a mathematical problem may not be an optimal solution to the engineering or economic problem under consideration. There are now two alternatives. We can either complicate the mathematical model to remove this difficulty, or we can look for approximate solutions of the mathematical problem which more nearly solve the physical problem. Which step we take depends upon the available time, the cost, the utility of improved solutions, and so on.

If we are interested in finding approximate solutions of the mathematical problem, then the foregoing techniques are useful.

In somewhat the same context, we are frequently forced, because of the limited memories of computers and their slowness of computation, to use much coarser grids in both space and time than we would like. Sometimes, we are forced to retain this type of solution for want of better, while, occasionally, we can use these crude solutions as initial approximations to be successively improved.

One way of evaluating the meaningfulness of a coarse approximation is to examine the behavior of the neighborhood of the optimal policy. If there is too drastic a change, we can be assured that the formulation is too crude. If the change is slight as we go from optimal to second best, from second best to third best, and so on, then there is a chance that we are getting worthwhile results.

The numerical solution of any physical problem must always be subjected to a stability, or sensitivity, analysis of this type.